pacman::p_load(olsrr, ggstatsplot, sf, tmap, tidyverse, gtsummary, performance, see, sfdep, spdep, tidygeocoder)TH3a: Take-Home Exercise 3 post-feedback
Take Home Exercise 3a
TH3.1 Introduction
This page is a reimplementation of Take-Home Exercise 3 in response to feedback from Prof. Kam Tin Seong, in preparation for the submission of the group project. The objective of Take-Home Exercise 3 is to prototype a module to be built for the Geospatial Analytics project, which will serve to analyse housing prices in Johor Bahru.
Since separating from Malaysia in 1965, Singapore has seen a rapid development in its economy and quality of life the likes of which no one else has seen. So it made sense for the neighbouring city of Johor Bahru, Malaysia, to capitalise on the spillover effects of Singapore’s prosperity by itself becoming a lifestyle hub for visiting Singaporeans, and an emerging business satellite for companies from Singapore and the region. New malls have sprouted all over the state capital, and new investments were made by the state and federal governments to boost the city’s economy, such as the development areas of Iskandar Puteri and Pasir Gudang, the Johor Bahru-Singapore rapid transit link and the Iskandar Puteri stop on the since-shelved Kuala Lumpur-Singapore High Speed Rail, among many others. In particular, Johor Bahru became a place where Singaporeans could buy houses at prices cheaper than in their hometown - even more so when housing prices in the country began to fly through the roof in recent times. Even while this phenomenon led to further economic growth, there were concerns that the presence of these Singaporean buyers would put inflationary pressure on the city.
Admist all this, this project aims to explore the trajectory and factors influencing housing prices in Johor Bahru. Using the measures of local and global spatial autocorrelation and a hedonic pricing model, we will perform geospatial data analysis on property transaction data in Johor Bahru and its northwestern neighbour Kulai. As the group member in charge of exploratory data analysis, this take-home exercise serves to prototype the module for displaying the measures of spatial correlation, including Local Moran’s I, LISA and the Gertis-Ord \(G^*_i\) statistics. In addition, geocoding and simple feature engineering tasks will be performed to ensure the dataset is ready for machine learning use going forward.
The scope of today’s exercise will be as follows:
- Load and clean the geospatial and aspatial data required for the analysis
- Geocode the aspatial property transaction data
- Prepare the aspatial data for machine learning by performing simple feature engineering tasks
- Create a hexagonal grid of the study area prior to performing analysis
- Perform exploratory data analysis
- Derive the local measures of spatial autocorrelation, and perform hot/cold spot analysis
Screenshots from the Shiny UI corresponding to each step will be displayed as it is performed in this exercise.
TH3.2 Loading data
Today’s exercise will involve going through the entire data acquisition process from start to finish. This includes sourcing and cleaning the data before it is used for analysis.
TH3.2.1 Loading geospatial data
We will be using Malaysia’s municipality polygons, sourced from the United Nations Humanitarian Data Exchange.
adm3_my <- st_read(dsn = 'shiny/data/geospatial/myadm3', layer = 'geoBoundaries-MYS-ADM3') %>%
st_transform(3377)Reading layer `geoBoundaries-MYS-ADM3' from data source
`/Users/kendricktty/Gits/smu_cs/is415-site/TakeHome/TakeHome3/shiny/data/geospatial/myadm3'
using driver `ESRI Shapefile'
Simple feature collection with 1859 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 98.93646 ymin: 0.8538234 xmax: 115.6782 ymax: 6.726112
Geodetic CRS: WGS 84
qtm(adm3_my)
The overall shape of the plot is as above, corresponding to the geography of Peninsula (West) Malaysia, as well as the eastern states of Sabah and Sarawak.
TH3.2.2 Loading property transaction data
property_transaction_data <- read_csv('shiny/data/aspatial/Open Transaction Data.csv')New names:
Rows: 20517 Columns: 14
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(12): Property Type, District, Mukim, Scheme Name/Area, Road Name, Month... num
(1): Land/Parcel Area lgl (1): ...14
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `Unit` -> `Unit...9`
• `Unit` -> `Unit...11`
• `` -> `...14`
head(property_transaction_data)# A tibble: 6 × 14
`Property Type` District Mukim `Scheme Name/Area` `Road Name`
<chr> <chr> <chr> <chr> <chr>
1 1 - 1 1/2 Storey Semi-Detached Johor Bah… Band… KAW ABDUL SAMAD/M… JLN ISMAIL
2 1 - 1 1/2 Storey Semi-Detached Johor Bah… Band… KAW ABDUL SAMAD/M… JLN ABDUL …
3 1 - 1 1/2 Storey Semi-Detached Johor Bah… Band… KAW ABDUL SAMAD/M… JLN YUSOF …
4 1 - 1 1/2 Storey Semi-Detached Johor Bah… Band… KIM TENG PARK JLN BERLIAN
5 1 - 1 1/2 Storey Semi-Detached Johor Bah… Band… KIM TENG PARK JLN UNGKU …
6 1 - 1 1/2 Storey Semi-Detached Johor Bah… Band… LARKIN GARDEN JLN DATO J…
# ℹ 9 more variables: `Month, Year of Transaction Date` <chr>, Tenure <chr>,
# `Land/Parcel Area` <dbl>, Unit...9 <chr>, `Main Floor Area` <chr>,
# Unit...11 <chr>, `Unit Level` <chr>, `Transaction Price` <chr>, ...14 <lgl>
TH3.3 Data Preprocessing
TH3.3.1 Filtering out Mukims outside Johor Bahru
We are only concerned with Mukims within Johor Bahru and Kulai district - the latter of which contains Johor Bahru’s main Senai international airport. Anything else will need to be filtered out.
johor_kulai_mukims <- c("MUKIM JELUTONG", "MUKIM PLENTONG", "MUKIM PULAI", "MUKIM SUNGAI TIRAM", "MUKIM TANJUNG KUPANG", "MUKIM TEBRAU", "BANDAR JOHOR BAHRU", "MUKIM BUKIT BATU", "MUKIM KULAI", "BANDAR KULAI", "MUKIM SEDENAK", "MUKIM SENAI")
adm3_jb_kulai <- adm3_my %>%
filter(shapeName %in% johor_kulai_mukims)
tmap_mode('view')tmap mode set to interactive viewing
tm_shape(adm3_jb_kulai) +
tm_polygons()As we can see, there are still some stray polygons located very far north from Johor Bahru and Kulai proper that we have to remove. A quick inspection of the dataframe in memory indicates they are associated with the Mukims of Pulai and Jelutong, and carry the IDs 4, 5, 6, 16.
The code chunk below will break up the multipolygons into simple polygons, and remove the strays. The result is the correct, contiguous borders of the Mukims in Johor Bahru and Kulai districts.
broken_up_jb <- adm3_jb_kulai %>%
st_cast("POLYGON")Warning in st_cast.sf(., "POLYGON"): repeating attributes for all
sub-geometries for which they may not be constant
stray_polygons <- c(4, 5, 6, 16)
adm3_jb_kulai <- broken_up_jb %>%
filter(!row_number() %in% stray_polygons)
tm_shape(adm3_jb_kulai) +
tm_polygons() +
tm_basemap("OpenStreetMap")write_rds(adm3_jb_kulai, 'shiny/data/rds/adm3_jb_kulai.rds')TH3.3.2 Geocoding Property Data
The raw dataset encodes information about street names, but we need latitudes and longitudes to plot their locations on a map. Therefore, it would be necessary to geocode the data. A dataset as large as ours took at least 2 hours to geocode, so it would be useful to save it as an RDS for later.
property_transaction_sf <- property_transaction_data %>%
mutate(Address = if_else(
!is.na(`Road Name`),
paste(`Road Name`, District, sep = " "), # Concatenate "Road Name" and "District" if "Road Name" is not NA
paste(`Scheme Name/Area`, District, sep = " ") # Else, concatenate "Scheme Name/Area" and "District"
)) %>%
geocode(address = Address, method = "osm", lat = "latitude", long = "longitude")
write_rds(property_transaction_sf, "shiny/data/rds/property_transaction_sf.rds")I learned the hard way to encode all longitudes and latitudes with WGS 84 first before transforming it into my desired coordinate system.
property <- read_rds("shiny/data/rds/property_transaction_sf_notnull.rds") %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>% st_transform(3377)TH3.3.3 Simple Feature Engineering
Finally, to make our dataset compatible for machine learning use, we need to engineer features based on the two categorical variables we want to ingest: the Property Type and tenure.
Since Property Type is multiclass categorical, we need to perform one-hot encoding. The following code chunk performs one-hot encoding on the Property Type feature.
dummies <- model.matrix(~ `Property Type` - 1, data = property) %>% as.data.frame()
colnames(dummies) <- gsub("Property Type", "", colnames(dummies))
property <- property %>%
bind_cols(dummies)Since Tenure is binary categorical (the property is either freehold or leasehold), we can perform binary encoding on this feature.
property <- property %>%
mutate(is_leasehold = ifelse(Tenure == "Leasehold", 1, 0))Finally, to help us join the property data with our polygon data later, we need to modify the “Mukim” column in the property data column to match that in the polygon data column.
property <- property %>%
mutate(Mukim = if_else(
!startsWith(Mukim, "Bandar"),
paste("Mukim", Mukim),
Mukim
)) %>%
mutate(Mukim = toupper(Mukim))We also need to parse the number in the Transaction Price column and convert the price from Malaysian Ringgit to US dollars. Following this, we will save the dataset we have as an RDS file so that the human-readable data is accessible to the Shiny server.
property <- property %>%
mutate(
`Price_MYR` = parse_number(`Transaction Price`),
`Price_SGD` = `Price_MYR` * 0.3333333,
`Price_USD` = `Price_MYR` * 0.21
) %>%
select(-`Transaction Price`)
write_rds(property, "shiny/data/rds/property_preprocessed.rds")To visualise the results, we can now plot our work onto a map.
tmap_mode('view')tmap mode set to interactive viewing
tm_shape(adm3_jb_kulai)+
tm_polygons(alpha = 0.3) +
tm_shape(property) +
tm_dots(col = "Price_MYR",
alpha = 0.6,
style="kmeans",
palette = "YlOrBr",
n = 10,
popup.vars = c(
"Scheme Name/Area", "Road Name", "Mukim", "District",
"Month, Year of Transaction Date", "Land/Parcel Area", "Main Floor Area",
"Price_MYR", "Price_SGD", "Price_USD"
)) +
tm_view(set.zoom.limits = c(11,14)) +
tm_basemap('OpenStreetMap')